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This publication describes the Fortran iv mathemati- 
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Fortran iv or map programmer who wishes to use 
them. 

The first part of this publication deals primarily with 
the rules for calling Fortran iv mathematical subrou- 
tines in either a Fortran iv or map program. The 
second part details the mathematical principles from 
which these subroutines have been constructed. Also 
included are performance statistics indicating the speed 
and accuracy of the subroutines. 

The reader should be familiar with one of the follow- 
ing publications, depending upon the programming 
language with which he works: 
IBM 7040/7044 Operating System (16/32K): FOR- 
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IBM 7040/7044 Operating System (16/32K): Macro 
Assembly Program (MAP) Language, Form C28- 
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FORTRAN IV Mathematical Subroutine Library 



Introduction 

The Fortran iv mathematical subroutine library con- 
tains three types of subroutines: single-precision, 
double-precision, and complex. These subroutines may 
be used by a Fortran iv or a map programmer to per- 
form mathematical computations. This publication pro- 
vides the information required by a Fortran iv or map 
programmer who wishes to use these subroutines. 



Calling Sequences 

Each subroutine in the library provides one or two 
mathematical functions. Each function is identified by 
a unique entry point. For this reason, the name of each 
subroutine is distinct from the name of its entry 
point(s). 

The method used to call a specific function depends 
upon the programming language used (i.e., Fortran iv 
or map). In each case, however, the programmer must 
specify an entry point and one or more arguments. 
(An argument is the name of a location that contains 
the value to be supplied as input to a function.) Figure 
1 shows the general form by which a mathematical 
function is called in each language. The specific form 
for calling each function is shown in Figures 2 through 
8 of the section "fortran iv Mathematical Subroutines." 



Arguments and Answers 

The arguments of most of the functions provided by 
the subroutines must be normalized floating-point num- 
bers. Only the mtn, xpi, xp2, and fdxi subroutines 
differ in this respect. These four subroutines, along 
with xp3, fdx2, and fca, require calling sequences that 
differ from the general forms shown in Figure 1. These 



exceptions are given in Figures 2 through 8 of the sec 
tion, "fortran iv Mathematical Subroutines." 

Double-Precision Arguments 

A double-precision argument consists of two adjacent 
words. The location of the first word is considered to 
be the location of the entire argument. The first word is 
the high-order part of the double-precision number, 
while the second word is the low-order part, map pro 
grammers should note that the location of the high 
order part must be an even-numbered storage loca- 
tion if the processing unit they use is equipped with 
the double-precision instruction set. This restriction 
does not apply if double -precision operations are simu- 
lated by the Floating-Point Trap Subroutine. 

Complex Arguments 

A complex argument consists of two adjacent words. 
The first word contains the real part of the complex 
argument, while the second word contains the imagi 
nary part. The location of the real part is considered 
to be the location of the entire complex argument. 

Answers 

Each function produces a single answer. For fortran 
iv programs, the answer is stored in the leftmost vari- 
able of the arithmetic assignment statement that was 
used to call the function ( see Figure 1 ) . For map pro- 
grams, upon return to the calling program, the answer 
is found either in the ac register (for single-precision 
functions), or in the ac and mq registers (for double- 
precision and complex functions). More specifically, 
for double-precision functions, the high-order part of 
the answer is stored in the ac, and the low-order part 
in the mq; for complex functions, the real part is stored 
in the ac, and the imaginary part is stored in the mq. 



Source 
Language 


Calling Sequences For Functions 
that Require One Argument 


Calling Sequences for Functions 
that Require Two Arguments 


FORTRAN IV 


y = entry point (argument) 
answer stored in y 


y = entry point (argl, arg2) 
answer stored in y 


MAP 


CALL entry point (argument) 
answer left in AC or AC-MQ 


CALL entry point (argl, arg2) 
answer left in AC or AC-MQ 



Figure 1. General Form of Calling Sequences 
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Role of the Execution Error Monitor 

The nature of the functions and the nature of the ma- 
chine registers used in computations impose certain 
limits on the range of input arguments for most func- 
tions. Figures 2 through 8 of the section, "fortran iv 
Mathematical Subroutines" give the valid argument 
range for each function. 

Any function that places a limitation on the range of 
arguments it will accept also provides the user with an 
option for handling invalid arguments. These options 
are called optional returns and each type of optional 
return is identified by an error code. Thus, for each of 
the 28 optional returns, there is a specific error code. 
(The optional return and error code for each function 
are given in Figures 2 through 8 of the section, "For- 
tran iv Mathematical Subroutines.") 

Whenever an invalid argument is detected by a func- 
tion, the function immediately transfers control to the 
Executor Error Monitor (i.e., subroutine xem — see the 
publication IBM 7040/7044 Operating System (16/ 
32K): System Programmer's Guide, Form C28-6339, 
for information about xem). The xem subroutine then 
checks the option control bits (i.e., bits 1 through 28) 
of matop., a one- word control section within xem. The 
position of each option control bit corresponds to an 
error code ( and, thus, a specific optional return ) ; that 
is, bit 1 corresponds to error code 1, bit 2 corresponds 
to error code 2, etc. If the option control bit corre- 
sponding to the error code of the function that called 
xem is set to 1, then the optional return for that par- 
ticular function is to be taken. The xem subroutine 
prints a warning message to this effect and execution 
continues with the optional return having been taken. 
However, if the option control bit is set to 0, then the 
optional return for that particular function is not to be 
taken and execution is to be terminated. The xem sub- 
routine prints error messages for both the user and the 
machine operator and then terminates execution. 

In the distributed version of xem, all the option con- 
trol bits in matop. are set to 1. Thus, if these control 
bits are not reset, all cases of invalid arguments will re- 
sult in the taking of the appropriate optional return. 
Termination of execution because of an invalid argu- 
ment can result only for those functions whose option 
control bits have been set to 0. This resetting can be 
accomplished in either of two ways: 

1. It can be done during execution by a map program 
which reassembles the matop. control section. How- 
ever, the resetting is effective only for that particular 
application. 

2. It can be done prior to execution by reassembling 
xem so that the configuration of the option control bits 
would permanently meet the user's requirements. 

Whenever an optional return is taken, a conventional 



answer is returned to the user. Among the conventional 
answers given for invalid arguments is the largest pos- 
sible floating-point number, 2 127 — 2 100 in single-pre- 
cision cases, or 2 127 — 2 73 in double-precision cases. For 
the sake of brevity, these numbers are written as omega 
(i.e., Q,) in Figures 2 through 8 of the section, "fortran 
iv Mathematical Subroutines." 



Floating-Point Trap Subroutine 

A floating-point trap occurs whenever the ac or mq, or 
both, reach an overflow or underflow condition. 

Floating-Point Overflow 

Except for those subroutines that deal with complex 
numbers, floating-point overflow can never occur be- 
cause the arguments are screened upon entry to a sub- 
routine. If an argument is one that could cause over- 
flow, it is immediately treated as an invalid argument 
and control passes to the Execution Error Monitor. 

Floating-point overflow can occur during some of 
the complex subroutines when an answer approaches 
the overflow threshold even though both the real and 
imaginary parts of the complex argument lie within the 
valid range. Such is the case with those subroutines for 
which the cost ( in space and time ) of an effective pre- 
screening of arguments would be so excessive as to 
hinder seriously the efficiency of the subroutine. 

An occurrence of such an overflow causes a floating- 
point trap. The Floating-Point Trap Subroutine (i.e., 
system subroutine fpt) then sets the ac and the mq ac- 
cording to the trap condition. (See the publication 
IBM 7040/7044 Operating System (16/32K): System 
Programmers Guide, Form C28-6339, for information 
about trap conditions. ) After the ac and mq have been 
set, fpt prints an overflow message and then returns 
control to the subroutine from which the trap origi- 
nated. 

Floating-Point Underflow 

Any subroutine can cause floating-point underflow. An 
occurrence of underflow causes a floating-point trap. 
The fpt subroutine then sets the ac and mq according 
to the trap condition; after the ac and mq have been 
set, fpt returns control to the subroutine from which 
the trap originated. 

Double-Precision Simulation 

If a 7040/7044 System is equipped with the double- 
precision instruction set, the Subroutine Library will 
take full advantage of it. However, for a 7040/7044 
System that does not have the double-precision instruc- 
tion set, the simulation of double-precision operations 
is performed by the fpt subroutine as described below. 



A floating-point trap occurs whenever a double-pre- 
cision instruction is encountered during execution. 
After the trap occurs, the fpt subroutine simulates the 
double-precision operation and then returns control to 
the point at which the trap occurred. 

Note: If double-precision simulation is not required, 
the user may delete that part of fpt that provides the 
simulation capability. The publication IBM 7040/7044 
Operating System (16/32K): System Programmer's 
Guide, Form C28-6339, gives the details for the dele- 
tion procedure. 



Evaluating Accuracy 

Because the size of each machine word is limited, small 
errors may be generated by mathematical subroutines. 
In an elaborate computation, slight inaccuracies can 
accumulate to become larger errors. Thus, in interpret- 
ing final results, the user should take into account any 
errors introduced during the various intermediate steps. 
For a detailed discussion of errors, see Appendix A, 
"Algorithms and Accuracy Considerations." 



Fortran IV Mathematical Subroutines 

The Fortran rv mathematical subroutine library is 
summarized in the figures that make up the rest of this 
section. The figures are organized as follows: 

Figure 2. Single-Precision Exponential Subroutines 
Exponential (xpn) 

Exponential, fixed-point base and exponent (xpi ) 
Exponential, floating-point base, fixed-point expo- 
nent (XP2) 
Exponential, floating-point base and exponent 

(XP3) 

Square Root (sqr) 
Figure 3. Single-Precision Trigonometric Subroutines 
Sine/Cosine (scn) 
Tangent/Cotangent (tnct) 



Arctangent (atn) 

Arcsine/Arccosine (arscn) 
Figure 4. Single-Precision Miscellaneous Routines 

Logarithm (log) 

Hyperbolic Sine/Cosine (scnh) 

Hyperbolic Tangent ( tnh ) 

Error Function ( erf ) 

Gamma/Loggamma (gama) 
Figure 5. Double-Precision Exponential Subroutines 

Exponential (fdxp) 

Exponential, with floating-point base and fixed ex- 
ponent (fdxi; entry point dxpi. ) 

Exponential, with double-precision base and sin- 
gle- or double-precision exponent (fdx2) 

Square Root (fdsq) 
Figure 6. Double-Precision Trigonometric and Log- 
arithmic Subroutines 

Logarithm (fdlg) 

Sine/Cosine (fdsc) 

Arctangent (fdat) 
Figure 7. Complex Subroutines 

Square Root (fcsq) 

Exponential (fcxp) 

Exponential, with complex base and fixed expon- 
ent (fdxi; entry point cxpi.) 

Logarithm (fclg) 

Sine/Cosine (fcsc) 

Absolute Value (fcab) 

Arithmetic (fca) 
Figure 8. Other Subroutines 

mtn Routine 

Modular Function (fdmd) 

Note: Some of the ranges in these figures are repre- 
sented in powers of 2. Listed below are these values 
and the approximate decimal value corresponding to 
each. 

2" 127 & 5.878 X 10" 89 
2 20 s~ 1.049 X 10 6 
2* es 3.355 x 10 7 
2 50 s 1.126 X 10 15 
2 120 si 1.663 X 10 3 * 
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Subroutine 
Name 


Entry 
Point 


Definition 


Calling Sequence 


Valid Argument 
Range 


Error 
Code 


Options 


If Argument 
Range Is 


Then the 
Answer Is 


XPN 

(exponential) 


EXP 


y = e* 


FORTRAN IV 
y = EXP(x) 

MAP 

CALL EXP(x) 


x ^ 88.029692 


8 


x > 88.029692 


fi 


XPl 

(exponential, 
fixed-point base, 
m, and fixed-point 
exponent, n) 


.EXPl. 


y = m° 


FORTRAN IV 
y = m**n 

MAP 

CLA m 
LDQ n 
CALL .EXPl. 


m =^0 
n^O 


1 


m =1 n = 





2 


m = 
n <0 





XP2 

(exponential, 
floating-point base 
x, and fixed-point 
exponent n) 


.EXP2. 


y = x» 


FORTRAN IV 
y =1 x**n 

MAP 

CLA x 
LDQ n 
CALL .EXP2. 


x ^=0 


1 


x = n = 





2 


x = 
n <0 





XP3 

(exponential, 
floating-point base 
xi, and floating- 
point exponent X2) 


.EXP3. 


y = xi x 2 


FORTRAN IV 

y = xi**x 2 

MAP 

CLA Xi 
LDQ x 2 
CALL .EXP3. 


X!>0 


1 


Xl = X2 = 





2 


X 1= 
X2<0 





7 


X!<0 
X2 7^0 


M* 


SQR 

(square root) 


SQRT 


y = xi 


FORTRAN IV 
y = SQRT(x) 

MAP 

CALL SQRT(x) 


x^O 


14 


x < 


|x|l 



• Figure 2. Single-Precision Exponential Subroutines 



Subroutine 
Name 


Entry 
Point 


Definition 


Calling Sequence 


Valid Argument 
Range 


Error 
Code 


1 

Options \ 


If Argument 
Range Is 


Then the 
Answer Is 


SCN 

(sine and cosine 
functions, where 
argument x is 
expressed in radians) 


SIN 


y r= sine(x) 


FORTRAN IV 

y = SIN(x) 

MAP 

CALL SIN(x) 


| x | < 2 s5 


12 


| x | ^ 2 s5 





COS 


y zr: cosine(x) 


FORTRAN IV 
y = COS(x) 

MAP 

CALL COS(x) 


TNCT 

(tangent and co- 
tangent functions, 
where argument x 
is expressed in 
radians} 


TAN 


y = tan(x) 


FORTRAN IV 
y = TAN(x) 

MAP 

CALL TAN(x) 


| x | < 2 20 and x may 
not be an odd 
integral multiple 
of 7r/2 (see Note) 


3 


| x | ^ 2 s0 





4 


x-kJL 

2 

where k is an 
odd integer 


n 


COTAN 


y = cot(x) 


FORTRAN IV 
y = COTAN (x) 

MAP 

CALL COTAN (x) 


| x | < 2 20 and x may 
not be a multiple 
of ir (see Note) 


3 


|x|=*2» 





4 


X = k 7T 

where k is 
an integer 





ATN 

(arctangent 
functions, where 
the result / is in 
radians) 


ATAN 


y = arctan(x) 


FORTRAN IV 

y = ATAN(x) 

MAP 

CALL ATAN(x) 


Any argument 


not 
applicable 


not 
applicable 


not 
applicable 


ATAN2 


y = arctan ( — ) 

\X2/ 


FORTRAN IV 

y = ATAN2(xi,x 2 ) 

MAP 

CALL ATAN2(x a ,x 2 ) 


(xi,x 2 ) *£ (0,0) 


9 


(x!,x 2 ) = (0,0) 





ARSCN 

(arcsine and 
arccosine functions, 
where the result y 
is in radians) 


ARSIN 


y = arcsin(x) 


FORTRAN IV 
y = ARSIN(x) 

MAP 

CALL ARSIN(x) 


|x|^l 


13 


|x| > 1 




1! 


ARCOS 


y = arccos(x) 


FORTRAN IV 
y = ARCOS(x) 

MAP 

CALL ARCOS(x) 


Note: For a more detailed discussion of the valid argument ranges for the TNCT subroutine and how these ranges may be expanded or j 
reduced with the MTN subroutine, see Appendix A, "Algorithms and Accuracy Considerations." 



Figure 3. Single-Precision Trigonometric Subroutines 
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Subroutine 


Entry 








Valid Argument 


Error 


Options 




If Argument 


Then the 


Name 


Point 


Definition 


Calling Sequence 


Range 


Code 


Range Is 


Answer Is 


LOG 


ALOG 


y = log e (x) 


FORTRAN IV 


x > 


10 


x = 


— O 


(common 






y = ALOG(x) 










and natural 






MAP 










logarithm 
functions) 






CALL ALOG(x) 










ALOG10 


y = logio (x) 


FORTRAN IV 


11 


x<0 


log | x | 








y = ALOGlO(x) 
















MAP 
















CALL ALOGlO(x) 










SCNH 

(hyperbolic 


SINH 


y=y(e X -e- X ) 


FORTRAN IV 
y = SINH(x) 


| x | s£= 88.029692 


5 


| x | > 88.029692 





sine and cosine 
functions) 






MAP 

CALL SINH(x) 










COSH 


y = y(e* + e"") 


FORTRAN IV 
y = COSH(x) 








MAP 
















CALL COSH(x) 










TNH 

(hyperbolic 


TANH 


e x - e" x 


FORTRAN IV 
y = TANH(x) 


any argument 


not 
applicable 


not 
applicable 


not 
applicable 


e x + e~ x 


tangent) 






MAP 

CALL TANH(x) 










ERF 


ERF 


X 


FORTRAN IV 


any argument 


not 


not 


not 


(error sub- 
routine) 




2 C 2 

y= WJ e_UdU 



y = ERF(x) 

MAP 

CALL ERF(x) 




applicable 


applicable 


applicable 


GAMA 


GAMMA 


00 


FORTRAN IV 


2" 127 < x < 34.843 




2" 127 ^ x 





(gamma and 




y = ru x -\e- u )du 


y = GAMMA(x) 




20 


or 




loggamma 




MAP 






x 3s 34.843 




functions) 






CALL GAMMA(x) 










ALGAMA 


y = log 


_ oo "I 


FORTRAN IV 

y = ALGAMA(x) 


< x < 1.54926(2 120 ) 


21 


x^O 
or 











MAP 

CALL ALGAMA(x) 






x 3s 1.54926(2 X20 ) 





Figure 4. Miscellaneous Single-Precision Subroutines 
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Subroutine 
Name 


Entry 
Point 


Definition 


Calling Sequence 


Valid Argument 
Range 


Error 
Code 


Options 


If Argument 
Range Is 


Then the 
Answer Is 


FDXP 

(exponential) 


DEXP 


y = e< 


FORTRAN IV 
y = DEXP(x) 

MAP 

CALL DEXP(x) 


x ^ 88.029692 


24 


x > 88.029692 


a 


FDX1 

(exponential, 
floating-point base 
x, and fixed-point 
exponent n) 
(see Note 1) 


DXPl. 


y = x* 


FORTRAN IV 
y = x**n 

MAP 
CLA x 
LDQ x + 1 
TSL DXPl. 
PZE n 


x=^0 
nS±0 


1 


x = n = 





2 


x= 
n <0 





FDX2 

(exponential, 
double-precision 
base xi, and 
single- or double- 
precision ex- 
ponent X2) 


DXP2. 


y = xi** 


FORTRAN IV 
y = xi**X2 

MAP 
CLA xi 
LDQ xi + 1 
TSL DXP2. 

pfx X2 

(see Note 2) 


xi >0 

X2^0 


1 


xi = xs = 

or 

x a <0 





7 


xi<0 

X2 7 4 


1*1* 


FDSQ 

(square root) 


DSQRT 


y = xl 


FORTRAN IV 
y = DSQRT(x) 

MAP 

CALL DSQRT(x) 


x^=0 


28 


x <0 


|x|l 


Note 1: Subroutine FDX1 also has an entry point, CXP1., that provides an exponential function for a complex base with a fixed-point ex- 
ponent. Information on CXP1. is found in Figure 7. 

Note 2: pfx = PZE for a double-precision exponent 
pfx = MZE for a single-precision exponent 



Figure 5. Double-Precision Exponential Subroutines 
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Subroutine 


Entry 








Valid Argument 


Error 


Options 


If Argument 


Then the 


Name 


Point 




Definition 


Calling Sequence 


Range 


Code 


Range Is 


Answer Is 


FDLG 


DLOG 


y 


= log e (x) 


FORTRAN IV 


x >0 








(natural and 








y = DLOG(x) 










common logarithm 








MAP 










functions) 








CALL DLOG(x) 




25 


x = 


— 


DLOG10 


y 


= logio(x) 


FORTRAN IV 
y = DLOGlO(x) 


26 


x <0 


log | x | 










MAP 


















CALL DLOGIO(x) 










FDSC 


DSIN 


y 


= sine(x) 


FORTRAN IV 


| X | < 2 5 °7T 


27 


| X | ^ 2 60 7T 





(sine and cosine 








y = DSIN(x) 










functions, where 








MAP 










the argument x 
is expressed in 
radians) 








CALL DSIN(x) 










DCOS 


y 


= cosine (x) 


FORTRAN IV 










y = DCOS(x) 


















MAP 


















CALL DCOS(x) 










FDAT 


DATAN 


y 


zz arctan(x) 


FORTRAN IV 


any 


not 


not 


not 


(arctangent 








y = DATAN(x) 


argument 


applicable 


applicable 


applicable 


functions, where 








MAP 










the result y is 
expressed in 
radians) 








CALL DATAN (x) 










DATAN2 


y 


== arctan (-^7) 


FORTRAN IV 


(xi,Xs) ^ (0,0) 


22 


(xi,x 2 ) = (0,0) 













y = DATAN2(xi,x a ) 


















MAP 


















CALL DATAN2(x 1 ,x 2 ) 











Figure 6. Double-Precision Trigonometric and Logorithmic Subroutines 
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For the purpose of the following figure, 
z = (xi, x 2 ) = Xi + ix 2 
y = (yi, y2) = yi + iy2 



Subroutine 
Name 



Entry 
Point 



Definition 



Calling Sequence 



Valid Argument 
Range 



Error 
Code 



Options 



If Argument 
Range Is 



Then the 
Answer Is 



FCSQ 

(square root) 



CSQRT 



y = zi 
real y ^ 



FORTRAN IV 
y = CSQRT (z) 

AAAP 

CALL CSQRT(z) 



any argument 
(see Note 1) 



not 
applicable 



not 
applicable 



not 
applicable 



FCXP 

(exponential) 



CEXP 



y = e" 



FORTRAN IV 
y = CEXP(z) 

AAAP 

CALL CEXP(z) 



x, ^ 88.029692 

I x a I < 2 25 
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xi > 88.029692 



Q(cosx: -(- isinxa) 



16 



FDX1 

(exponential, 
complex base z, 
and fixed-point 
exponent n) 
(see Note 2) 



CXP1. 



FORTRAN IV 
y zr z**n 

MAP 

CLA z 

LDQ z + 1 

TSL CXP1. 

PZE n 



z^O 
n^O 



FCLG 

(natural logarithm) 



CLOG 



y = PV loge(z) 
(see Note 3) 



FORTRAN IV 
y = CLOG(z) 

MAP 

CALL CLOG(z) 



z 96 + Of 
(see Note 1) 
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FCSC 

(sine and cosine 
functions) 



CSIN 



y = sine(z) 



FORTRAN IV 

y = CSIN(z) 

MAP 

CALL CSIN(z) 



xi I < 2* 

x a I =^ 88.029692 
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CCOS 



y = cosine(z) 



FORTRAN IV 
y = CCOS(z) 

MAP 

CALL CCOS(z) 
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+ 0/ 



z = n = 



z = 
n <0 



z = + Of 



x 2 > 88.029692 



— fi + Of 



(see Note 4) 



+ 0; 

— — 0; 



FCAB 

(absolute value) 



CABS 



y= z 



FORTRAN IV 
y = CABS(z) 

MAP 

CALL CABS(z) 



any argument 
(see Note 1) 



not 
applicable 



not 
applicable 



not 
applicable 



FCA 

(arithmetic 
operations) 



FCAOP, 



Complex 
addition, 
subtraction, 
multiplication, 
and division 



(see Note 5) 



any argument 
(see Note 1) 



not 
applicable 



not 
applicable 



not 
applicable 



Note 1: Floating-point overflow can occur. 

Note 2: Subroutine FDX1 also has an entry point, DXP1., to an ex- 
ponential function for a floating-point base and a fixed-point 
exponent. (See Figure 5 for further information.) 

Note 3: The letters "PV" indicate that the "Principal Value" is re- 
turned to the user. That is, the answer given will be from 
that branch where the imaginary part lies between — rr 
and w. More specifically, — it < y 3 ^ ir unless xi < and 
X2 = — in which case y2 = — rr. 

Note 4: Optional answers for FCSC subroutine whan imaginary part 
of the complex argument is invalid: 



Note 5: Calling sequences for FCA subroutine: 



if X2 is 


result of 
CSIN(z) is 


result of 
CCOS(z) is 


> 88.029692 



— (sinxi -f- icosxi) 


— (cosxi — /sinxi) 


< —88.029692 


— (sinxi — icosxi) 


— (csoxi + isinxi) 



Arithmetic 
Operation 


FORTRAN IV 


MAP 
zi in AC-MQ 


Addition 

Z! + Z 2 


y = zi + z s 


TSL FCAOP. 

PZE z 2 


Subtraction 

Zl Z2 


y = zi — z 2 


TSL FCAOP. 
PON z a 


Multiplication 

Zl • Z 2 


y = zi*z 2 


TSL FCAOP. 
PTW zs 


Division 

Zl/Z2 


y = zi/z 2 


TSL FCAOP. 

PTH zs 



• Figure 7. Complex Subroutines 
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Subroutine 
Name 


Entry 
Point 


Definition 


Calling Sequence 


Valid Argument 
Range 


Error 
Code 


Options 


If Argument 
Range Is 


Then the 
Answer Is 


MTN 


MTAN 


Resets the ac- 
curacy guarantee 
for the single- 
precision tangent 
subroutine. 


FORTRAN IV 
CALL MTAN(k) 

MAP 

CALL MTAN(k) 


k^O 

but k must be 
an integer 
(see Note) 


not 
applicable 


not 
applicable 


not 
applicable 


FDMD 

(modular 
function) 


DMOD 


y = x (mod z) 
is computed as 

y = x — [x/z] z 
where brackets 
denote the in- 
tegral part of 
the quantity 


FORTRAN IV 
y = DMOD(x,z) 

MAP 

CALL DMOD(x,z) 


any double- 
precision 
floating-point 
numbers 


not 
applicable 


not 
applicable 


not 
applicable 


Note: A detailed description of the purpose and use of MTN is found in Appendix A, "Algorithms and Accuracy Considerations." 



Figure 8. Other Subroutines 
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Appendix A: Algorithms and Accuracy Considerations 



Introductory Information 

This section contains information on the algorithms 
and performance of most of the subroutines in the 
Fortran iv mathematical subroutine library. The sub- 
routines for which no such information is given are: 

XP1, XP2, XP3, FDX1, FDX2, and FDMD. 

Accuracy 

Because the size of a machine word is limited, small 
errors may be generated by mathematical subroutines. 
In an elaborate computation, slight inaccuracies can 
accumulate to become larger errors. Thus, in inter- 
preting final results, the user should take into account 
any errors introduced during the various intermediate 
stages. 

The accuracy of an answer returned by a subroutine 
is influenced by two factors: (1) the accuracy of the 
argument, and (2) the performance of the subroutine. 

The Accuracy of the Argument 

Most arguments contain errors. An error in a given 
argument may have accumulated over several steps 
prior to the use of the subroutine. Even data fresh 
from input conversion contain slight errors since deci- 
mal data cannot usually be exactly converted into the 
binary form required by the processing unit; the con- 
version process is usually only approximate. Argument 
errors always influence the accuracy of answers. The 
effect of an argument error on the accuracy of an an- 
swer depends solely on the nature of the mathematical 
function involved and not on the particular coding by 
which that function is computed within a subroutine. 
In order to assist users in assessing the accumulation of 
errors, a guide on the propagational effect of argument 
errors is provided for each function. Wherever possi- 
ble, this is expressed as a simple formula. 

The Performance of the Subroutines 

The performance statistics supplied in this appendix 
are based upon the assumption that arguments are 
perfect (i.e., without errors, and therefore having no 
argument error propagation effect upon answers). Thus 
the only errors in answers are those introduced by the 
subroutines themselves. 

For each subroutine, accuracy figures are given for 
one or more segments throughout the valid argument 
range(s). In each case the particular statistics given 
are those most meaningful to the function and range 



under consideration. For example, the maximum rela- 
tive error and standard deviation of the relative error 
of a set of answers are generally useful and revealing 
statistics, but useless for the range of a function where 
its value becomes 0, since the slightest error of the 
argument value can cause an unbounded fluctuation 
on the relative magnitude of the answer. Such is the 
case with sin(x) for x near tt, and in this range it is 
more appropriate to discuss absolute errors. 

Symbols Used in Describing Accuracy 

In the presentation of error statistics, the following 
symbols are employed: 



g(x) 
f(x) 



= the answer given by the subroutine for the mathe- 
matical function under discussion. 

= the correct extra precision answer for the mathe- 
matical function under discussion. 



| f(x)-g(x) 
I f(x) 



, the relative error of the answer. 



5 = the relative error of the argument. 

E = | f(x) — g(x) | , the absolute error of the answer. 

A = the absolute error of the argument. 

M(E) = Max | f(x) — g(x) | , the maximum absolute error 
produced during testing. 

I f(x) I 
produced during testing. 



M(e) 



, the maximum relative error 



a(E) 



«r(e) 



= Vi?l ,<x ' ) - 



g(Xi 



(standard deviation) absolute error. 



, the root-mean-square 



= A /1 S| f(x.)-g(x,) | 2 

V N i I f(xi) I > the root-mean-square 
(standard deviation) relative error. 



When applied to complex numbers, the absolute 
value signs in the above formulas should be regarded 
as denoting complex absolute value. Thus, the above 
formula for e represents the vector error when applied 
to a complex function. 

Algorithms 

Some of the formulas are widely known; those that are 
not widely known are derived from more commonly 
known formulas. In such cases, the steps leading from 
the common formula to the computational one have 
been detailed so that the derivation may be recon- 
structed by anyone who has a basic understanding of 
mathematics and who has access to the common texts 
on numerical analysis. Background information for 
algorithms involving continued fractions may be found 
in Wall, H.S., Analytic Theory of Continued Fractions, 
D. Van Nostrand Co., Inc., Princeton, N. J., 1948. 
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Algorithms and Performance Statistics 

Single-Precision Exponential — XPN 
Algorithm 

1. Write x[log 2 (e)] = n + r (where n is the integer 
part and r is the fraction part). Then 

e* = (2 n ) (2 r ), - 1< r < 1 

2. Compute 2 r by means of a rational approximation 
formula. This formula was derived in the following 
way: 

Take the Gaussian-type continued fraction 

z_i_ z z ?L - II. — — — 

e ~T- 1 + 2-3 + 2-5 + 2-7 + 2-..., 

truncate at the ninth term and rewrite it to obtain 

■ ^ 1680 + 840z + 180z 2 + 20z 3 + z* 
e ~ 1680 - 840z + 180z 2 - 20z 3 + z 4 

Substituting r[log e (2)] for z and rewriting the above, 
we get: 



2' ££ 1 + 



2r 



Cr ! 



-' + d -(?Ta) 



k i* + A, 

where A, B, C, and D are constants. 
The maximum relative error of this formula is 
1.6 X 10- 9 . 

3. If x < -89.415987, then is given as the answer. 

4. Computation is carried out in fixed-point to mini- 
mize truncation errors. 

Effect of an Argument Error 

e ~ A. Since A = 8x, for the larger value of x, even the 
round-off error of the argument causes a substantial 
relative error in the answer. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

ff(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


O^x ^ 1 


3.37 X 10- 9 


7.32 X 10~ 9 


859 


339 


| x | ^ 88.028 


4.80 X 10~ 9 


1.50 X 10" 8 


850 


334 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 9. XPN, Exponential Function 
Single-Precision Square Root — SQR 

Algorithm 

1. Write x = [2 2p_(J ] (m), where p is an integer, 
q = or 1, and 1/2 ^m<l. Then, V* = 2p V2-«(m) 
for 1/4 ^ [2- ( i(m)] < 1. 



2. The first approximation y of Vx for 1/4 — x < 1 
is obtained by one of the four hyperbolic fits of the 
form y = a + b/(c + x) depending on the size of x. 
If 2~ 1/2 — x<l, then the constants a, b, and c are 
chosen to minimize the relative error of the fit while 
giving the exact value for x = 1. The maximum relative 
error of this fit is 0.5 X 10~ 4 , and its contribution to the 
final relative error is less than 0.13 X 10~ 8 . 



If 1/4 



< 2~ 3/2 , or 2- 3 / 2 ^ x < 1/2, or 



1/2 — x<2 -1/2 , the constants a, b, and c are chosen 
to minimize the contribution to the absolute error of 
the answer. In these cases, the contribution to the final 
absolute error is less than 0.5 X 10 -9 , or 0.6 X 10~ 9 , or 
0.7 X 10~° depending on the range. 

3. Apply the Newton-Raphson iteration once to get 
the answer: 



yi =K yo+ f) 



Effect of an Argument Error 
Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


All positive 
numbers 


3.32 X 10° 


7.54 X 10" 9 


688 


243 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



Figure 10. SQR, Square Root Function 



Single-Precision Sine/Cosine — SCN 

Algorithm 

1. sin (— x) = — sin (x), cos (— x) = cos (x); assume 
x^0. 

2. Write x ~-j- (q) + r, where q is an integer, and 

^ r <— • Let Qo - q [mod 8] . 
4 

3. If cos (x) is desired, raise q by 2, reduce it modulo 
8 and compute sine. If sin (x) is desired and if x < 2 -13 , 
give x as the answer. 

4. Now the case is reduced to the computation of 

sin (-j- q + r), — q — 7. Using the formulas: 
sin [^-(4 + q„) + r] = - sin [-J(qo) + r] ^ q„ ^ 3 



16 



sta[«-+r]=oos[f -r] 

sin ^ y + r J = cos (r) 

sin[-^+r]=sin[-|— r] 
the case is reduced to the computation of sin (r) or 
cos (r) for — r ——r- 

5. The coefficients of the approximation 

sin (r) = r (s + s^ 2 + s 2 r 4 + s 3 r 6 ) were obtained by the 

IT 

Chebyshev interpolation over the range — r — -^ . 

The coefficients of the approximation 
cos (r) = 1 + c x r 2 + c 2 r 4 + c 3 r 6 + c 4 r 8 were obtained by 
the Chebyshev interpolation over the range 

-0.01 ^ r ^ 4" 

4 

The relative error of the sine formula is less than 
0.34 X 10- 8 . 

The relative error of the cosine formula is less than 
0.73 X 10- 10 . 

6. Computations are carried out in fixed-point to 
minimize truncation errors. 

Effecf of an Argument Error 

E ~ A. As the argument gets larger, A grows, and since 
the value of the function is periodically diminishing, no 
consistent relative error control can be maintained out- 
side the principal range off — -^-> — Y This holds true 
for cosine as well. 
Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

ff(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


\*\^f 


4.82 X 10~ 9 


1.64 x icr 8 


1063 


437 



Figure 11. SCN, Sine Function (I) 



Argument 
Range 


Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 

Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


1 X 1 - 2 


0.90 X 2" 28 


1.20 X 2" 27 


1063 


437 


-< 1x1=^10 
2 ' ' 


1.10 X 2" 28 


1.84 X 2" 27 


1150 


475 


10<|x[^100 


1.11 X 2~ 28 


1.90 X 2" 27 


1154 


482 



Argument 
Range 


Root-Mean-Square 

Absolute Error 

«r(E) 


Maximum 

Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


— X — 7T 


1.14 X 2- :!S 


1.85 X 2" 27 


1236 


501 


— 20 ^ x < 
TT < x ^ 20 


1.11 X 2" 28 


1.88 X 2" 27 


1234 


500 



Figure 13. SCN, Cosine Function 

Note: The sample arguments upon which the statistics in Figures 
11, 12, and 13 are based were uniformly distributed over the speci- 
fied ranges. 

Single-Precision Tangent/Cotangent — TNCT 
Algorithm 

I 1. If x <0, use tan (x) = — tan (|x|), 

cot(x) = — cot(|x|). Assume x — now. 

2. Write x =-j-(q) + r, where q is an integer and 

^ r <-j- . Let q = q [mod 4]. 

3. If q = or 2 (i.e., octant 1 and 3), define r = r. 
If q = 1 or 3 (i.e., octant 2 and 4), define r = | — r. 

4. Define the case number s as follows: 
If tan (x) is desired, s = q . 

If cotan (x) is desired, s = 1 f or q = 0, or s = for 
q = 1, or s = 3 for q = 2, or s = 2 for q = 3. 

5. Compute the factor F as follows: 

13.946 r 2 - 313.11 



F=l+ . 

r 2 - 104.46 + (■ 

If r ^ 2" 14 , then F = 1. 



939.33 

ro 2 



if r > 2-". 



This approximation can be obtained by rewriting the 
continued fraction: 



tan (r ) 



Figure 12. SCN, Sine Function (II) 



r -1-3-5-7- 8.946 

The maximum relative error of this formula is 10 9 . 
6. Now the answer is r /F for s = 0, F/r for s = 1, 
- F/r for s = 2, - r /F for s = 3. 

Relative Error Control 

Let x = (2 n ) m. If the case number s above is 1 or 2, 
and if the reduced argument r is less than 2 -26+n (with 
the exception of a cotangent entry with small argu- 
ments), an execution error is signalled. This is the case 
when the argument is so close to a singularity that the 
minimal indeterminate value of the argument (caused 
by pre-rounding) can cause a relative error of up to 
1/3. No screening is given for arguments near a zero 
of the function. 
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The foregoing can be strengthened or eliminated by 
the use of the mtn subroutine. 

An execution error is also signalled if |x| — 2 20 , or if 
the cotangent function is requested with |x| < 2~ 126 . 

Effect of an Argument Error 

E ~ A/cos 2 (x), e ~ 2A/sin (2x) for tan (x). Thus, near 
the singularities x = (k + 1/2) it, where k is an integer, 
neither absolute error control nor relative error con- 
trol can be maintained. Similarly, this is true for 
cotan(x), where x = k tr, and k is an integer. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


1 1 -= 2L 
1*1- 4 


4.40 X 10" fl 


1.41 X 10~ 8 


1106 


458 


JL < | x 1 ^ JL. 
4^'l 2 


5.29 X 10~ 9 


(6.46 X 10~ 3 )* 


1386 


557 


f< |x|^10 


9.35 x 10" 9 


(6.09 X 10~V 


1307 


532 


10<|x =£=100 


9.70 X 10" B 


(9.97 X 10-*)* 


1311 


533 



Figure 14. TNCT. Tangent Function 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


1*1^7 


4.34 X 10 -9 


1.34 X 10~ 8 


1234 


498 


~<\A^H~ 


1.05 X 10 -8 


(8.64 X 10~ 8 )* 


1345 


537 


^<|x|-10 


9.08 X 10-" 


(2.41 X 10" 8 )* 


1406 


559 



Figure 15. TNCT, Cotangent Function 



*The figures cited as the maximum relative errors are those en- 
countered among 2500 random samples in the respective ranges. 
For all the perfect arguments in the full legal range (under the 
standard error control), the maximum relative error is estimated to 
be 2 X 10-". 

Note: The sample arguments upon which the statistics in Figures 

14 and 15 are based were uniformly distributed over the specified 

ranges. 



Single-Precision Arctangent — ATN 
Algorithm 

1. Assume — x — 1, 

arctan (— x) = — arctan (x); 

arctan( r !v) =-^- — arctan (Ixl) 
|x| 7 2 ' ' 

2. If [~tan(15 )~] — x — 1, reduce further to the range 
|x"| ^ |"tan(15°)l byarctan(x) = 30° + arctan(x), where 

_ A 

X = V3- 



x + V3 

3. By transforming the Taylor series into a con- 
tinued fraction, we obtain 



arctan(x) = x 



3 + 5 



(4)(5) 
(7)(7)(9) 



43 



7 + X " (7)(11) 



+ x~ 2 -f- w 



(where w is an abbreviation for further items) 
Dropping w and rewriting the formula, we get 



arctan(x) = x 



64 x a 



+ ™ + 



(3)(5 2 )(7 2 ) ^ (5 3 )(7 2 ) 



(3 B )(i3)(79) 



(5 4 )(7 3 ) 



x 2 + 



34063 



(3)(5)(13)(79) 



[■ 



(14641)(1100) -i 
(3 2 )(7)(13 2 )(79 a ) 



x 2 + 



(H)(389) 



(3)(13)(79)-l 

The maximum relative error of this approximation 
for |x| ^ [tan(15 )] is 6 X 10" 11 . 

4. Fixed-point computation is used to minimize 
truncation errors. 

5. atan2 provides the extended answer range 

— 71- — y — 7r, depending on the combination of signs 
of the two arguments. 

Effect of an Argument Error 

E — . For small x, e ~ 8; and as x becomes large, 

1 + x 2 

the effect of 8 on e diminishes. 
Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


The Entire Range 


3.56 X 10- 9 


1.41 X 10~ 8 


1293 


516 


Note: The sample arguments upon which the above statistics are 
based were tangents of uniformly distributed numbers between 

— ~~Z an d — . (That is, the sample was such that the values re- 
turned by the function were uniformly distributed over the answer 
range.) 



Figure 16. ATN, Arctangent Function 
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Single-Precision Arcsine/Arccosine — ARSCN 
Algorithm 

1. If — x — 1/2, compute arcsin(x) by the use of 
the Chebyshev interpolation polynomial of degree 5 
over this range. 

2. If !<x^l, 



arcsin 



(x)=-£-2f 






arcsin 



(W)] 



In this range we have ^ A/— ~- < 1/2, which re- 
duces the case to that of item 1 above. 

3. If — x ^ 1, arccos(x) = |- - arcsin(x). This re- 
duces the case of arccosine to that of arcsine. 

4. If — 1 — x < 0, use arcsin(x) = — arcsin(— x), and 
arccos(x) = <* — arccos(— x) to reduce to the earlier 
cases. 

5. The sqr subroutine is used in step 2, above. 

Effecf of an Argument Error 

A 



E lr=f ■ Thus, for small x, E ~ A. For arsin with 

VI — x 2 

a small x, e~8. Toward the limit of the range, a small 

argument error causes a substantial error in the answer. 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<7(e) 


Maximum 
Relative Error 

M(«=) 


Average 

Speed in 

Microseconds 


7040 


7044 


|x|-<I 


5.39 X 10' fl 


3.15 X TO" 8 


1335 


551 



Figure 17. ARSCN, Arcsine Function 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


|x | ^ 1 


5.60 X 10" 9 


1.93 X 10~ 8 


1421 


575 



Figure 18. ARSCN, Arccosine Function 

Note: The sample arguments upon which the statistics in Figures 
17 and 18 are based were uniformly distributed over the specified 
range. 



Single-Precision Logarithm — LOG 

Algorithm 

1. If |1 — x| <2 -7 , use the polynomial approxima- 
tion: 



log(l + z) - z - [2- 1 + 3(2" 18 )] z 2 + | z 3 , 

where z = x — 1. 
The maximum relative error of this formula for 
|z|<2~ 7 , is 3X10~ 8 . 

2. If |1 — x| — 2 -7 , reduce the case as follows: 

write x = 2 p (m), where — ^ m < 1; 



V2 



m + 



V2 



Also 



1 + z 



l0ge(x) 



, then | z | < 0.1716. 

m \/2 and 
+ log 2 



=[>-!+- (1^)] 



l0ge(2). 



3. By transforming the Taylor series into a continued 
fraction, we obtain 



log e (lif) =2z 



1+ T + 



Y+ z- 2 + w(z)_ 



Whenz 



the approximate value of the 

( 7 ) jc _ . Thus, we ob- 

{) L (7)(9)(140)J 



(3)(3371)(z 2 )-(1019)(z 4 ) 



(2z) (*) 



J (7)(H) 
\ 2843 

remainder term w 

tain the approximation: 

w / l + z \ (3 3 )(4)(5)(7 

10ge \l-zj- (3 3 )(4)(5)(7 2 ) - (3)(6311)(z 2 ) 

This formula reduces to the form 

log 2 (frf) s z [ci + c 2 z 2 + c 3 /(z 2 + c 4 )l. 

The maximum relative error of (*) is 0.62 X 10~ 9 
for |z| < 0.1716. However, since the procedure in item 
2 may involve the cancellation of significant digits, this 
relative accuracy can not be maintained for the final re- 
sult if the argument value is near 1. 

Effect of an Argument Error 

E ~ 8. In particular, if 8 is the round-off error of the 
argument, say S ~ 7 X 10~ 9 , then E ~ 7 X 10~ 9 . This 
means that if the argument is close to 1, the relative 
error can be very large, since the value of the function 
at that point is very small. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Absolute Error 

ff(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


15 ^ ^i u 
~16 X _ "i6~ 


1.17 X 2" a ' 


1.44 X 2~ 81 


1024 


419 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 19. LOG, Natural Logarithm Function (I) 
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Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


All positive 
numbers outside 

\ 16 16/ 


3.39 X lO' 9 


8.95 X 10 -9 


1104 


452 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



Figure 20. LOG, Natural Logarithm Function (II) 



Argument 
Range 


Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


15 ^ ^ 17 

16 — x ~ 16 


1.95 X 2" M 


1.14 X 2" 31 


1151 


471 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 21. LOG, Common Logarithm Function (I) 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


All positive 
numbers outside 

\ 16 ^6/ 


4.84 X 10" 9 


1.75 X 10 s 


1231 


504 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



• Figure 22. LOG, Common Logarithm Function (II) 



Single-Precision Hyperbolic Sine/Hyperbolic Cosine 
—SCNH 



Algorithm 



1. cosh(x) = 



e x + e~ 



2. If |x| > 0.3465736, use sinh(x) = ^ 

3. If |x| ^ 0.3465736, use sinh(x) - x + ^r + lf] + yT 
The maximum relative error of this approximation is 
6 X 10- 10 . 

4. This subroutine uses the exponential subroutine 

XPN. 



Effect of an Argument Error 

For the sinh function: 

A 2 A 2 

E ~ A [cosh(x)] + — [sinh(x)] and e~ A [coth(x)] + —^- 

For the cosh function: 

A 2 A 2 

E ~ A [sinh(x)] + -r- [cosh(x)] and e ~ A [tanh(x)] + — 

In particular, for cosh, e ~ A over the entire range. 
However, for sinh given a small value for x, e~ 8. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


| x | ±£0.3466 


5.78 X lO' 9 


1.45 X 10 -8 


731 


288 


0.3466<[x|^10 


7.10 X 10 _e 


2.61 X 10 -8 


1427 


534 



Figure 23. SCNH, Hyperbolic Sine Function 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


|x| ^ 10 


7.26 X 10" 9 


2.22 X 10" 8 


1314 


500 



Figure 24. SCNH, Hyperbolic Cosine Function 

Note: The sample arguments upon which the above statistics in 
Figures 23 and 24 are based were uniformly distributed over the 
specified range. 



Single-Precision Hyperbolic Tangent — TNH 
Algorithm 

1. For |x| < 0.5493, use a modified continued fraction 

x x 2 x 2 x 2 x 2 

tanh(x) = T+ "3- + -g- + Y + 902743 

The maximum relative error of this approximation is 
4 X 10- 9 . 

2. For 0.5493 ^ x < 10.4, use tanh(x) = 1 - 



(e 2x ) + 1 

3. The xpn subroutine is used in step 2, above. 

4. For x — 10.4, give tanh(x) = 1. 

5. For x ^ - 0.5493, use tanh(- x) = - tanh(x). 



Effect of an Argument Error 



E~[l-tanh 2 (x)]A,e' 



2A 



. , ,_ v . Thus, for small values 
sinh(2x) 

of x, e ~ 8, and x gets larger, the effect of 8 on e 
diminishes. 



20 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


| x [±£0.5493 


5.70 X 10- 9 


1.44 X 10~ 8 


993 


401 


0.5493<|x |^10.4 


4.28 X TO" 9 


1.09 X 10" 8 


1307 


495 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 25. TNH, Hyperbolic Tangent Function 

Single-Precision Error Function Subroutine — ERF 
Algorithm 

1. erf(— x) = — erf(x). Assume x — now. 

2. Ifx>4.17,erf(x)^l. 

3. If 4.17 — x > 1.51, use the following Gaussian- 
type continued fraction: 

00 



l-erf(x)= -== JV u "du 
V*-l_x J 

re 

/• 2 2 F 0-5x 

J e- u du = e- x j^ x 2 + 



where 



0.5 



5 - 



3.0 



- x 2 + 4.5 - 



r o.5x 

|_ x 2 + 0.1 



x 2 + 2.5 

(n- %)n 
x 2 + 2n + % 
0.5 



-...] 



3.0 



x 2 + 2.5 - x 2 + 4.5 



7.5 



10.803 ~[ 
•■ + 4.269 J 



- x 2 + 6.5 - x* + 

The maximum absolute error of this approximation 
is 1.1 X 10- 9 . 

The two constants in the last term are obtained by 
requiring that the formula give the exact values at 
x = V2J3T25 and x = V2/75. 

4. If 1.51 — x — 0, use the continued fraction ob- 
tained by transforming the Taylor expansion of erf: 



= 1 



+ x 2 + 5.578 + w(x) 
(Constants given in the second formula are approxi- 
mate.) 

Drop w(x) and modify the two constants of the last 

term so that the formula is exact at x = 1 and x = V2. 
The maximum relative error of the formula obtained 
in this way is 2 X 10~ 9 . 

5. Fixed-point computation is employed to minimize 
truncation errors. This subroutine uses the exponential 
subroutine xpn. 



3 + 2!5 
1.0281x 2 

x 2 + 10.216 
201.39 


+ 
+ 


3!7 + 4!9 
- 167.17 
x 2 + 9.8103 
31.228 


+ x 2 + 11.570 
64.244 


x 2 - 1.7730 



Effect of an Argument Error 

E ~ A (e- x2 ). As the magnitude of the argument in- 
creases from 1, the effect of an argument error on the 
final accuracy diminishes rapidly. For small x, e ~ 8. 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


[x|-4 


4.75 X 10' 9 


1.95 X TO" 8 


1972 


792 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 26. ERF, Error Function 

Single-Precision Gamma/Loggamma — GAMA 

Algorithm 

1. If 0<x^2- 127 , use log[r(x)] = -log(x) for 

ALGAMA. 

2. If 2 -127 <x<4 for gamma, reduce the case to 
1 -s x ^ 2, using x [r(x)] = r(x + 1). Compute T(x) 
for 1^ x ^ 2, using a continued fraction obtained in the 
following manner: 



OO c 

T(x) = / e" 4 1*- 1 dt = X 







k:=0 



k! 



(x - 1.5) k (*) 



where 

OO 

a k = / (log t) k e" t2 t 2 dt 

and 

a = 1 V"" 

ai = 0.80845993 X 10" 2 



an = -0.1628 X 10 10 
Then, by transforming (*) into a continued fraction, 
we obtain, 



r(z + 1.5) = 



where 



z + /3i + z + /9 S + z + /3s 



+ z + ^4 +Z + /36 + w(z) 



(**) 



ai = - 1.2581927 
a 2 = 33.814358 
m = 59.285853 
cu = — 3.6512651 
a B = 6.0985782 



/3i = -11.746649 
/Si = — 4.8326355 
/3 S = 8.1171874 
pi = 0.20317850 
p 5 = 1.4063097 



Finally, drop w(z) in (**), and compensate for it by 
modifying the constants /3 4 , «5, and /? 5 to obtain an 
approximation formula accurate to within an absolute 
error of 2.3 X 10~ 9 . 

3. If 2~ 127 < x < 4 for algama, compute log [r(x)] 
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by first computing r(x) as in item 2, and then taking 
the logarithm of the result. 

4. If 4 ^ x < 1.54926(2 120 ) for algama, compute 
log [r(x)j as follows: 

log [r(x)] « x [log(x) - 1] - i log(x) + i log(27r) + F(x) 
where 



F(x) 
F(x) 



Oif 
1 



+ 



if x < 2 1 



12x 360x 3 ' 1260x B 1760x 7 
This formula is the result of economizing Stirling's 
asymptotoc series. For the range considered, its abso- 
lute error is less than 2.1 X 10 -9 . 

5. If 4 ^ x < 34.843 for gamma, compute r(x) by 
first computing log[r(x)] as in item 4, and then tak- 
ing its exponential base e. 

6. Subroutines log and xpn are used by this sub- 
routine. 



Effect of an Argument Error 

For r(x), e~ |>(x)]A, and for log[r(x)], E~ [*(x)]A, 
where * is the digamma function. 

For l/2<x<3, -2<#(x)<l, and E~A for 
log [r(x)]. However, since x = l and x = 2 are zeros 
of log [r(x)], even a small 8 can cause a substantial 
c in this interval. 

For large values of x, *(x) ~ log(x). Hence, for r(x), 
€~8 [xlog(x)]. This shows that even the round-off 
error of the argument contributes greatly to the relative 
error of the answer. However, for log [r(x)] with large 
values of x, c ~ 8. 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


2" 127 < x < 1 


4.38 X 10-° 


1.26 X 10 s 


1542 


596 


1 ^x <2 


5.32 x icr B 


1.06 X 10 s 


1446 


552 


2^x <4 


6.88 X 10- 9 


2.84 X 10" 8 


1617 


632 


4=^x < 10 


9.00 X 10" 8 


3.31 X 10" 7 


3175 


1308 


10^x <34 


5.88 X 10" 7 


2.02 X 10~ 6 


3178 


1310 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


< x s£ 1/2 


5.18 X 10" 9 


2.14 X 10" 8 


2620 


1039 


3 =£=x <4 


8.45 X 10" 9 


3.47 X 10 -8 


2761 


1102 


4=£=x < 10 


1.08 X 10~ 8 


3.61 X 10 -8 


2221 


943 


10^x < 100 


1.30 X 10 -8 


3.35 X 10" 8 


2228 


950 



Figure 28. GAMA, Loggamma Function (I) 



Argument 
Range 


Root-Mean-Square 
Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


1/2 < x < 3 


1.28 X 2" 28 


1.94 X 2 -27 


2574 


1020 



Figure 27. GAMA, Gamma Function 
22 



Figure 29. GAMA, Loggamma Function (II) 

Note: The sample arguments upon which the statistics in Figures 
27, 28, and 29 are based were uniformly distributed over the speci- 
fied ranges. 

Double-Precision Exponential — FDXP 
Algorithm 

1. e x = 2 y , where y = x log 2 (e)l . 

Write y = y 1 + y 2 , where y x is the integer part and 
y 2 is the fraction part. 

If y — 0, then set z x = y l5 and z 2 = y 2 . 

If y < 0, then set x x = y x — 1, and z 2 = y 2 + 1. 

Then, 2 y = (2 z i)(2 z 2), where z x is an integer and 
^ z 2 ^ 1. 

2. For 0^z 2 ^l, 2 Z 2 is computed by use of the 
Chebyshev interpolation polynomial of degree 11 for 
the interval. The maximum relative error of this poly- 
nomial is 2~ 57 . 

3. If x ^ -89.415987, is given as the answer. 

Effect of an Argument Error 

e ~ A. Since A = 8x, for the larger value of x even the 
round-off error of the argument causes a substantial 
relative error in the answer. 

Performance Statistics 

Most of the double-precision subroutine perform- 
ance figures have two sets of statistics for each argu- 
ment range. Those statistics preceded by an "S" were 
calculated on machines not having the double-preci- 
sion instruction set; those preceded by a "D" were cal- 
culated on machines having the double-precision in- 
struction set. 



Argument 




Root-Mean-Square 
Relative Error 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


£== X ^= 1 


S 


4.24 X 10"" 


1.68 X 10' ia 


21008 


. — 


D 


3.74 X 10- 17 


1.07 X lO" 16 


2619 


1289 


| x | =s£ 88.028 


S 


4.82 X TO" 17 


1.96 X lO" 19 


21018 


— 


D 


4.38 X 10~ 17 


1.39 X 10" 19 


2633 


1293 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 30. FDXP, Exponential Function (Double-Precision) 

Double-Precision Square Roof — FDSQ 
Algorithm 

1. Write x = 2 2 e-i(m), where p is an integer, q = 
or 1, and— ^m <1. Then 



Vx =2"V2- , »(m) 

2. Take the first approximation y to be 

y = 2P (f+i)' ifq = 



or 



y = 2>(f+^),ifq = l 



The relative error of y is less than 2 -4 . 

3. Apply the Newton-Raphson iteration four times 
to y (three times in single-precision and the fourth 
time in double-precision). 

yn+i= i( yn + 7„) 

The maximum relative error of y 3 is 2 -53 . 
Effecf of an Argument Error 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


10- 80 ^x< 10 87 


4.26 X 10" 17 


1.59 X lO" 19 


1346 


496 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



Figure 31. FDSQ, Square Root Function (Double-Precision) 



Double-Precision Logarithm — FDLG 
Algorithm 

1. Write x = 2 n (m), wherey^m <1. Define the base 



— value Fas: 



F = l,if^|-^ m <l 
m-F 



Let z = m . Then, m = Fx( ) and I z I ^0.1716. 

m + 1 \1 — z/ ii 

log(x) = n [log(2)] + log(F) + log (f^f ) 
log(F) = -log(2) for^ m <^| 
log(F) = for^pss m < 1 

4 log(l±£)= &<! + *+ *+...) 

= z (c + Ci z 2 + . . . + c 7 z 14 ) 
where the coeflBcients e , Ci, . . . , c 7 are obtained by 

the Chebyschev interpolation. 

The maximum relative error of this approximation 

is 2~ 59 - 9 . 



Effecf of an Argument Error 

E ~ 8. In particular, if 8 is the round-off error of the 
argument, say 8 ~ 5.6 X 10- 17 , then E - 5.6 X 10~ 17 . 
This means that if the argument is close to 1, the rela- 
tive error can be very large, since the function value 
is very small at that point, 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


7040 


7044 


1 ^ x ^2 
2 


S 


1.3 X 2- M 


1.64 X 2" 158 


20801 


— 


D 


1.76 X 2 "*• 


1.89 X 2 M 


2278 


1102 


Note: The sarr 
based were un 


pie arguments upon which the above statistics are 
iformly distributed over the specified range. 



Figure 32. FDLG, Natural Logarithm Function (I) 
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Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


Full Range 
excluding 

(i-0 


S 


9.65 X 10~" 


2.95 X lO"' 8 


20827 


— 


D 


6.09 X TO" 17 


1.80 X 10" 16 


2307 


1123 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



Figure 33. FDLG, Natural Logarithm Function (II) 



Argument 




Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


i — > 


S 


1.85 X 2" B8 


1.12 X 2 _53 


22340 


— 


D 


1.66 X 2"* 1 


1.64 X 2' 6i 


2446 


1182 


Note: The sample 
based were unifo 


arguments upon \ 
rmly distributed ove 


vhich the above statistics are 
r the specified range. 



Figure 34. FDLG, Common Logarithm Function (I) 



Argument 




Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


Full range 
excluding 

(i-0 


S 


1.18 X TO" 16 


4.45 X lO" 16 


22370 


— 


D 


9.10 X 10"" 


3.31 X 10" 18 


2475 


1203 


Note: The sample arguments upon which the above statistics are 
based were exponentially distributed over the specified range. 



• Figure 35. FDLG, Common Logarithm Function (II) 

Double-Precision Sine/Cosine — FDSC 

Algorithm 

1. If cos(x) is desired, reduce the case to a sine func- 
tion by 

cos(x) = sin ( -^r — x J 

2. If x < 0, use sin(— x) = — sin(x). Assume x — 0. If 
j x | < 2~ 26 , give x as the answer. 

24 



3. Divide x by-jand separate the integer part, qi, 

and the fraction part, q 2 , of the quotient. Let 
q = qi[mod 8]. Then 

sin(x) = sin I -^-qo + ^q? ) 

4. Further reduce the case to the computation of 
either sin(jr) or cos(^r) (where O^r^l) by the 



formulas : 



^ q < 3 



sin [-J- (4 + q ) + -J- q 2 ] = -sin (-J q„ + -J q 2 ) 

«dn(f +f q 2 )=cos[f (1-q*)] 
sin (|- + f q, ) = cos (^ q. ) 

sin (ir +i q =^[i- (1 - q2) ] 

5. ForO^r^l, 

sin (-j r ) = S r + Sir 3 + S 2 r 5 + • • • + Sor 13 
cos (-f r ) = 1 + Cir 2 + C 2 r* + ■ ■ ■ + Cr 12 

where the coefficients S , . . . , S 6 are obtained by the 
Chebyschev interpolation over the range — r ^ 1, and 
the coefficients C , . . . , C 6 are obtained by the Cheby- 
shev interpolation over the range — 0.01 — r^l. 

The maximum relative error of the sine approxima- 
tion is 2~ 58 . The maximum relative error of the cosine 
approximation is 2 -54 . 



Effect of an Argument Error 

E — A. As the argument gets larger, A grows, and since 
the value of the function is periodically diminishing, no 
consistent relative error can be maintained outside 

the principal range ( - -|-» "J) • Tnis holds true for co- 
sine as well. 



Performance Statistics 



Argument 




Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


l«Mf 


S 


1.66 X 10 -18 


7.83 X 10~ 16 


14989 


— 


D 


9.08 X 10'" 


3.81 X 10 -18 


1974 


943 



Figure 36. FDSC, Double-Precision Sine Function (I) 



Argument 




Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


M^f 


s 


1.26 X 2- Si 


1.26 X 2^ 2 


14989 


— 


D 


1.12 X 2-" 


1.00 X 2" 52 


1974 


943 


_|<|x|-10 


S 


1.25 X 2- 81 


1.46 X 2" 4B 


15014 


— 


D 


1.16 X 2- M 


1.32 X 2- 60 


2086 


988 


10 < | x | ^ 100 


S 


1.27 X 2 -48 


1.07 X 2" 4B 


15055 


— 


D 


1.08 X 2~ 49 


1.21 X 2-" 


2085 


986 



Figure 37. FDSC, Double-Precision Sine Function (II) 



Argument 




Root-Mean-Square 

Absolute Error 

<r(E) 


Maximum 
Absolute Error 

M(E) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


— x — w 


S 


1.68 X 2- 51 


1.80 X 2" 52 


15924 


— 


D 


1.52 X 2- 81 


1.14 X2 -57 


2106 


987 


—20 ^ x < 
7T < x ^ 20 


S 


1.76 X 2" 80 


1.26 X 2" i7 


16034 


— 


D 


1.63 X 2~ B2 


1.33 X 2" 40 


2202 


1025 



Figure 38. FDSC, Double-Precision Cosine Function 

Note: The sample arguments upon which the statistics in Figures 
36, 37, and 38 are based, were uniformly distributed over the speci- 
fied ranges. 



Double-Precision Arctangent — FDAT 
Algorithm 

1. Reduce the general case to — x — 1 by using the 
formulas: 

arctan(— x) = — arctan(x), arctan (-, — ,) = -|- — arctan(| x |). 

2. Then reduce the case further to | x | —tan (15°) 
by using 

arctan(x) = 30° -farctan ( a/3 — — -=) > 
\ x+\/3 / 

if tan(15°)<x<tan(45°). 

3. For the basic range | x | ^ tan (15°), a continued 
fraction approximation of the following form is used: 

arctan(x) 



This formula can be derived by transforming the 
Taylor series into the continued fraction 



arctan(x) 



= 1 - 



(3X4) 

(5 2 )(7) 



T + x " 



23 



(5)(9) 



+ x- 



(5 2 )(2*) 
(7)(9 2 )(11) 



(4)(7 2 )(3) 
(5)(11)(13 2 ) 



59 



+ x" 



(3)(37) 



+ x" 2 + w 



(9)(13) ^~ (13)(17) 

Then, as the approximation of the value w = w(x), use 
the value —0.00398, so that when the above fraction is 
rewritten with this value of id, the formula (*) is ob- 
tained. The maximum relative error of the formula (*) 
is less than 2 -55 . 

4. As a result of the subtraction involved in item 
2, computational errors are heaviest for the range 

tan(15°) — | x | — tan(45°), especially as | x | approaches 
tan (15°). 

5. datan2 provides the extended answer range 
— 7T— y— 77-, depending on the combination of the signs 
of the two arguments. 

Effect of an Argument Error 

E ~ ]■ x 2 • For small x, e~8; and as x increases, the 
effect of 8 on c decreases. 

Performance Statistics 



Argument 




Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


Range 


7040 


7044 


Full range 


S 


1.87 X lO' 18 


1.21 X 10 15 


18427 


D 


2.12 X 10" te 


1.29 X lO -10 


2375 


1137 


Note: The sample arguments upon which the above statistics are : 
based were tangents of random numbers uniformly distributed over : 

/ 7T IT \ 

the interval! — ~t ~ 1. (That is, the argument sample was such ; 

that the values returned by the function were uniformly distributed 
over the answer range.) 



Figure 39. FDAT, Double-Precision Arctangent Function 

Complex Square Root — FCSQ 
Algorithm 



1. Write \/x + iy =£ 4- i-q 

2. If x — +0, use $ = a/ j — ! l i_ 



V — 



U 



= 1 + 



ft + x 2 + |8, + x 2 + J9. + x 2 + p t + x 2 



C) 



3. If x ^ -0, use £=-£-, v = S gn(v) A^/— L U 

Thus, if x < 0, the case y = —0 is differentiated from 
the case y = +0. That is, 



Appendix A: Algorithms and Accuracy Considerations 25 



Vx-Oi = 
and 



lim 



lim 



+0 



Vx— ei 



Vx+« 



Vx+oi = e _^ +0 

4. If, in the foregoing computation, 

1 x 1 + 1 x + iy | . 9 _ 129 

then + Oi is given as the answer. 

5. The sqr and fcab subroutines are used by this 
subroutine. 

Note: If | x | + | x + iy | — 2 127 , floating-point overflow 
occurs. 

Effect of an Argument Error 

If we write x + iy = r(e lh ) and Vx" + iy = R( iH ), then 
£ (R) ~ L 8(r) and e(H) - 8(h). 

Performance Statistics 



Argument 
Range 



io- 



xi+ix 2 | ^ 10 s7 



Root-Mean-Square Maximum 
Relative Error Relative Error 
<r(e) M(e) 



5.06 X 10" 



1.56 X 10 _ 



Average 

Speed in 

Microseconds 



7040 



2413 



7044 



849 



Note: The distribution of sample arguments upon which the above 
statistics are based is exponential radially and uniform around the 
origin. 



Figure 40. FCSQ, Complex Square Root Function 



Complex Exponential — FCXP 

Algorithm 

1. e x+ly = e x [cos(y)] + i[e x sin(y)] 

2. The xpn and scn subroutines are used by this sub- 
routine. 

Effect of an Argument Error 

If we write &+* = R(e 1H ), then H = y and e(R) ~ A(x). 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


| xi |^85, |x 2 |^10 


9.34 X 10' 9 


2.58 X lO' 8 


3885 


1528 


Note: The sample arguments upon which the above statistics are 
based were uniformly distributed over the specified range. 



Figure 41. FCXP, Complex Exponential Function 
26 



Complex Natural Logarithm — FCLG 

Algorithm 

1. Write log(x + iy) = £ + i v . Then, $ = log | x + iy |, 

and 7j = arctan (— ) (in the sense of atan2). 
x 

2. This routine differentiates the argument x - Oi 
from the argument x + Oi. That is, 

lim 



log(x-0i)= 1 ™ +0 iog(x-a) 



log(x + 0i)= ^ +0 teg(x + d) 

3. Subroutines fcab, atn, and log are use by this 
subroutine. 
Note: If | x 4- iy | — 2 127 , floating-point overflow occurs. 

Effecf of an Argument Error 

When we write x + iy = r(e ih ) and log(x + iy) = £ + iri, 
we have h = v and E(£) ~ 8(r). When the argument is 
near 1 + Oi, the answer is almost zero; therefore a small 
8 can cause a large c. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 
Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


Full range, except 

the immediate 

neighborhood of 

1 + Oi 


3.59 X lO" 9 


1.14 X 10 s 


4406 


1656 


Note: the distribution of sample arguments upon which the above 
statistics are based is exponential radially and uniform around the 
origin. 



Figure 42. FCLG, Complex Natural Logarithm Function 



Complex Sine/Cosine — FCSC 

Algorithm 

1. sin(x + iy) = [sin(x)] [cosh(y)] 

+ [i] [cos(x)] [sinh(y)] 
cos(x + iy) = [cos(x)] [cosh(y)] 

- [i] [sin(x)] [sinh(y)] 

2. Subroutines scn and scnh are used by this sub- 
routine. 

Effecf of an Argument Error 

Combine the effects of the sine and cosine functions 
of the scn subroutine and the sinh and cosh functions 
of the scnh subroutine according to the algorithm in 
item 1 above. 



Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


1 xi 1 — 10/ 

1 X2 1 — 1 


1.68 X 10" 8 


(4.15 X 10" 8 )* 


5548 


2129 


* The maximum relative error cited here is based on a set of 2500 
random samples uniformly distributed over the range. In the im- 
mediate neighborhood of the points mr + 0i (where n = ±1, ±2, 
. . . ,) the relative error can be quite high in spite of a small 
absolute error at these points. As x 2 remains constant and | xi | 
increases, the relative error increases; as xi remains constant and 
1 X2 | increases, the relative error remains substantially the same. 



Figure 43. FCSC, Complex Sine Function 



Argument 
Range 



xi | ^ 10 
x 2 | < 1 



Root-Mean-Square 

Relative Error 

<r(e) 



1.70 X 10" 



Maximum 
Relative Error 

M(e) 



(4.12 X 10" 8 )* 



Average 

Speed in 

Microseconds 



7040 7044 



5498 



2119 



* The maximum relative error cited here is based on a set of 2500 
random samples uniformly distributed over the range. In the im- 
mediate neighborhood of the points (n + i)rr + Oi (where 
n = 0, ±1, ±2,.. . ,) the relative error can be quite high in spite 
of a small absolute error at these points. As X2 remains constant and 
| xi | increases, the relative error increases; as xi remains constant 
and | X2 | increases, the relative error remains substantially the same. 



Complex Arithmetic — FCA 
Algorithm 

1. (a + bi) ± (c + di) = (a 

2. (a + bi) (c + di) = (ac - 

3. Iflcl^ldl. 



± c) + (b ± d)i 
bd) + (ad + bc)i 



(a+bi) _ a/c + bd/c 2 - ad/c 2 + b/c , 
(c+di) ~ 1 + (d/c) 2 + 1 + (d/c) 2 (1) 

If | d | > | c |, then reduce to the above ease by 
transforming 

(a+bi) _ (b-ai) 
(c+di) (d-ci) 

Note: When the magnitude of the result is close to Q, 
a floating-point overflow is possible. When the magni- 
tude of the result is close to the underflow threshold, 
the accuracy of the answer diminishes. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

*(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


For pairs of operands 
that are away from 

the overflow or 
underflow thresholds 


1.09 X 10" 8 


2.62 X 10" 8 


883 


305 



Figure 46. FCA, Complex Multiplication Function 



• Figure 44. FCSC, Complex Cosine Function 



Complex Absolute Value — FCAB 
Algorithm 

1. If | x | ^ | y |, | x + yi | = | x | VI + (y/x) 2 + Oi 

2. If | y | > | x |, | x + yi | = | y | VI + (x/y) 2 + (H 

3. The sqr subroutine is used by this subroutine. 
Note: If the result is greater than Q, then floating-point 
overflow occurs. 

Performance Statistics 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 

Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


lO-^xi+ixal^lO 37 


6.71 X 10" 9 


2.29 X 10 -8 


1212 


436 


Note: the distribution of sample arguments upon which the above 
statistics are based is exponential radially and uniform around the 
origin. 



Argument 
Range 


Root-Mean-Square 

Relative Error 

<r(e) 


Maximum 
Relative Error 

M(e) 


Average 

Speed in 

Microseconds 


7040 


7044 


For pairs of operands 
away from the over- 
flow or underflow 
thresholds 


1.3 X I0" 8 


3.23 X 10~ 8 


1444 


554 



Figure 45. FCAB, Complex Absolute Value Function 



Figure 47. FCA, Complex Division Function 

Note: The distribution of sample operands upon which the statistics 
in Figures 46 and 47 are based is exponential radially and uniform 
around the origin. 



MTN Subroutine 

Purpose 

The purpose of the mtn subroutine is to modify the 
error control of the tnct (single-precision tangent/co- 
tangentjsubroutine. 
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Usage and Effect 

The calling sequence for the mtn subroutine is 

CALL MTAN(k) 
When this calling sequence is used with k = 0, 1, 2, 3, 
4, or 5, the tnct subroutine is modified to give a mini- 
mum relative accuracy guarantee of l/(2 k + 2 -1) for 
correctly rounded arguments near the singularities. If 
this calling sequence is used with k greater than 5, 
tnct is modified to suspend the above accuracy guar- 
antee feature completely. In this case, any argument 
less than 2 20 in magnitude (and greater than 2 -126 for 
the cotangent function) will be accepted. 

Modifications remain in effect until mtn is called 
again. 



Note: A map programmer need not use the mtn sub- 
routine to reset the accuracy guarantee. He can ac- 
complish the same effect by including the following 
coding in his program. 

1. To modify the guarantee 



CLA 
ALS 
STO 



= 1 

k 

CRIT. 



where 



k = 0,1,...,5 



2. To eliminate the guarantee 
STZ CRIT. 
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Appendix B: Storage Requirements 



The following chart shows the octal and decimal stor- 
age requirements for the Fortran iv Mathematical 
Subroutines. 



SUBROUTINE 

ARSCN 

ATN 

ERF 

FCA 

FCAB 

FCLG 

FCSC 

FCSQ 

FCXP 

FDAT 

FDLG 

FDMD 



STORAGE 


REQUIREMENTS 


OCTAL 


DECIMAL 


117 


79 


230 


152 


136 


94 


131 


89 


44 


36 


45 


37 


110 


72 


55 


45 


63 


51 


231 


153 


166 


118 


60 


48 



SUBROUTINE 

FDSC 

FDSQ 

FDX1 

FDX2 

FDXP 

GAMA 

LOG 

MTN 

SCN 

SCNH 

SQR 

TNCT 

TNH 

XP1 

XP2 

XP3 

XPN 



STORAGE REQUIREMENTS 
OCTAL DECIMAL 



215 


141 


75 


61 


210 


136 


72 


58 


174 


124 


245 


165 


145 


101 


20 


16 


157 


111 


100 


64 


100 


64 


171 


121 


76 


62 


73 


59 


67 


55 


51 


41 


106 


70 
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Appendix C: Error Messages 



Listed below are the off-line error messages corre- 
sponding to the Fortran iv Mathematical Subroutine 
Library. The last two digits of the number at the left 
of each message is the error code. The error code iden- 
tifies the type of optional return that was taken when 



the error was detected. For example, error message 
10709 signifies that the optional return identified by 
error code 9 was taken. For further information on 
error codes and optional returns see the section "Role 
of the Execution Error Monitor." 



10701 

10702 

10703 

10704 

10705 

10707 

10708 

10709 

10710 

10711 

10712 

10713 

10714 

10715 

10716 

10717 

10718 

10719 

10720 

10721 

10722 

10724 

10725 

10726 

10727 

10728 



0**0 INVALID ARGUMENTS 

0**-X INVALID ARGUMENTS 

TAN-COT(X) WHERE I X I GT OR EQ TO 2**20 

TAN-COT(X) WHERE X TOO CLOSE TO A SINGULARITY 

SINH-COSH(X) WHERE | X | GREATER THAN 88.029692 

-B**C WHERE C IS REAL 

EXP(X) WHERE X GREATER THAN 88.029692 

ATAN2(0,0) INVALID ARGUMENTS 

LOG(0) INVALID ARGUMENT 

LOG(-X) INVALID ARGUMENT 

SIN-COS(X) WHERE | X | GT OR EQ TO 2**25 

ARSIN-ARCOS(X) WHERE | X | GREATER THAN 1 

SQRT(-X) INVALID ARGUMENT 

CEXP(Z) WHERE Z REAL PART GREATER THAN 88.029692 

CEXP(Z) WHERE | Z | IMAG PART GT OR EQ TO 2**25 

CLOG(0) INVALID ARGUMENT 

CSIN-CCOS(Z) WHERE I Z I IMAG PART GT OR EQ TO 88.029692 

CSIN-CCOS(Z) WHERE | Z | REAL PART GT OR EQ TO 2**25 

GAMMA(X) WHERE X IS -, NEAR 0, OR GT OR EQ TO 34.843 

ALGAMA(X) WHERE X IS NON-POSITIVE OR GREATER THAN 1.54926*2**120 

DATAN2(0,0) INVALID ARGUMENTS 

DEXP(X) WHERE X GREATER THAN 88.029692 

DLOG(0) INVALID ARGUMENT 

DLOG(-X) INVALID ARGUMENT 

DSIN-DCOS(X) WHERE | X | GT OR EQ TO (2**50)PI 

DSQRT(-X) INVALID ARGUMENT 
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Absolute Error 15 

Maximum 15 
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Algorithm 15 
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arscn Subroutine 9 

Algorithm 19 
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Size 29 

Use 9 

atn Subroutine 9 

Algorithm 18 

Performance 18 

Size 29 

Use 9 

Calling Sequence 5 

Complex Arguments 5 

Complex Subroutines 13 

CRIT 28 

Double-Precision 5 

Arguments 5 

Simulation 6 

Subroutines 11, 12 

Entry Point 5 

erf Subroutine 10 

Algorithm 21 

Performance 21 

Size 29 

Use 10 

Error 6, 7, 15 

Absolute 15 

Relative 15 

Error Code 6, 30 
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Error Messages 6, 30 

Execution Error Monitor (xem) 6 

fca Subroutine 13 

Algorithm 27 

Performance 27 

Size 29 

Use 13 

fcab Subroutine 13 

Algorithm 27 

Performance 27 

Size 29 

Use 13 

fclg Subroutine 13 

Algorithm 26 

Performance 28 

Size 29 

Use 13 

fcsc Subroutine 13 

Algorithm 26 



Performance 27 

Size 29 

Use 13 

fcsq Subroutine 13 

Algorithm 25 

Performance 26 

Size 29 

Use 13 

fcxp Subroutine 13 

Algorithm 26 

Performance 26 

Size 29 

Use 13 

fdat Subroutine 12 

Algorithm 25 

Performance 25 

Size 29 

Use 12 

fdmd Subroutine 14 

Size 29 

Use 14 

fdlg Subroutine 12 

Algorithm 23 

Performance 23, 24 

Size 29 

Use 12 

fdsq Subroutine 11 

Algorithm 23 

Performance 23 

Size 29 

Use 11 

fdxp Subroutine 11 

Algorithm 22 

Performance 22, 23 

Size 29 

Use 11 

fdxI Subroutine 11 

Size 29 

Use (Entry Point cxpI. ) 13 

Use (Entry Point dxpI. ) 11 

fdx2 Subroutine 11 

Size 29 

Use 11 

Floating-Point Overflow 6 

Floating-Point Trap Supervisor ( fpt ) 6,7 

Floating-Point Underflow 6 

Function 5, 6 

gama Subroutine 10 

Algorithm 21, 22 

Performance 22 

Size 29 

Use 10 

log Subroutine 10 

Algorithm 19 

Performance 19, 20 

Size 29 

Use 10 

MATOP 6 

mtn Subroutine 14 

Purpose 27 

Size 29 

Use 14, 28 

Option Control Bits 6 

Resetting of 6 
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Optional Return 6 

Relative Error 15 

Maximum 15 

Root-Mean-Square 15 

scn Subroutine 9 

Algorithm 16, 17 

Performance 17 

Size 29 

Use 9 

scnh Subroutine 10 

Algorithm 20 

Performance 20 

Size 29 

Use 10 

Single-Precision Subroutines 8, 9 

Simulation of Double-Precision Computations 6, 7 

sqr Subroutine 8 

Algorithm 16 

Performance 16 

Size 29 

Use 8 

Storage Requirements 29 

tnct Subroutine 9 



Algorithm 17 

Performance 18 

Relative Error Control 17 

Size 29 

Use 9 

tnh Subroutine 10 

Algorithm 20 

Performance 21 

Size 29 

Use 9 

xpn Subroutine 8 

Algorithm 16 

Performance 16 

Size 29 

Use 8 

xpI Subroutine 8 

Size 29 

Use 8 

xp2 Subroutine 8 

Size 29 

Use 8 

xp3 Subroutine 8 

Size 29 

Use 8 
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